unseen task
On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning (NeurIPS 2022)
Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and ...
If you ran experiments...
(a) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? Please refer to both main text and appendix for experiment details.
Did you report error bars (e.g., with respect to the random seed after running experiments multiple ... All adaptation experiments in Procgen and RLBench are run for 3 seeds.
Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal ... As stated in section 2, we use RTX A5000 GPUs each with 24GB memory.
C2F-ARM algorithm and training framework are built based on the original author's implementation ...
Did you mention the license of the assets?
Decomposed Prompt Decision Transformer for Efficient Unseen Task Generalization
Multi-task offline reinforcement learning aims to develop a unified policy for diverse tasks without requiring real-time interaction with the environment. Recent work explores sequence modeling, leveraging the scalability of the transformer architecture as a foundation for multi-task learning. Because tasks vary in content and complexity, formulating such policies is challenging: it requires careful parameter sharing and management of conflicting gradients to extract cross-task knowledge from multiple tasks and transfer it to unseen tasks. In this paper, we propose the Decomposed Prompt Decision Transformer (DPDT), which adopts a two-stage paradigm to learn prompts for unseen tasks in a parameter-efficient manner. We initialize DPDT with parameters from pre-trained language models (PLMs), thereby providing the rich prior knowledge encoded in language models. During the decomposed prompt tuning phase, we learn both cross-task and task-specific prompts on training tasks to achieve prompt decomposition. In the test-time adaptation phase, the cross-task prompt, serving as a good initialization, is further optimized on unseen tasks, enhancing the model's performance on these tasks. Empirical evaluation on a series of Meta-RL benchmarks demonstrates the superiority of our approach.